Abstract: This paper introduces a statistical method to decide 
whether two blocks in a pair of images match reliably. The 
method ensures that the selected block matches are unlikely to 
have occurred "just by chance." The new approach is based 
on the definition of a simple but faithful statistical background 
model for image blocks learned from the image itself. A theorem 
guarantees that under this model not more than a fixed number 
of wrong matches occurs (on average) for the whole image. 
This fixed number (the number of false alarms) is the only 
method parameter. Furthermore, the number of false alarms 
associated with each match measures its reliability. This a 
contrario block-matching method, however, cannot rule out false 
matches due to the presence of periodic objects in the images. But 
it is successfully complemented by a parameterless self-similarity 
threshold. Experimental evidence shows that the proposed method 
also detects occlusions and incoherent motions due to vehicles and 
pedestrians in non-simultaneous stereo. 

Index Terms: Stereo vision, Block-matching, Number of False 
Alarms (NFA), a contrario detection. 



I. Introduction 

Stereo algorithms aim at reconstructing a 3D model from 
two or more images of the same scene acquired at different 
angles. This work only considers previously stereo-rectified 
image pairs. In that case the 3D reconstruction requires that the 
matched points in both images belong to the same horizontal 
epipolar line. The matching process of stereo image pairs 
has been studied in depth for more than four decades. ||T| 
and [35] contain a fairly complete comparison of the main 
methods. According to these surveys there are roughly two 
main classes of algorithms in binocular stereovision: local 
matching methods and global methods. 

Global methods aim at a coherent solution obtained by 
minimizing an energy functional containing matching fidelity 
terms and regularity constraints. The most efficient ones seem 
to be Belief Propagation CSI, [42], Graph Cuts [16], Dynamic 
Programming [9], [28] and solvers of the multi-label problem 
(El, [30]. They often resolve ambiguous matches by maintaining 
a coherence along the epipolar line (DP) or along and 
across epipolar lines (BP & GC). They rely on a regularization 
term to eliminate outliers and reduce the noise. They give a 
match to all points which are not detected as occluded. Global 
methods are, however, at risk of making or propagating errors if 
the regularization term is not adapted to the scene. A classic 
example is when a large portion of the scene is nearly constant, 
for example a scene including a cloudless sky, since there is no 
information in such a region to compute reliable matches (see 
Fig. [TT] for an example). On such ambiguous regions, global 
methods perform an interpolation by using the informative 
pixels. This interpolation can be lucky, as is the case in most 
images of the Middlebury benchmark¹. But it can also fail, as 
is apparent in the above example and in many outdoor scenes. 
Furthermore, the energy in global methods has at least two 
terms and one parameter weighting them (and sometimes three 
terms and two parameters [16]). These parameters are difficult 
to tune, and even to model. Thus, it remains a valid question 
how to rule out by a parameterless method the dubious regions 
where the matches cannot be scientifically demonstrated. 

On the other hand, local methods are simpler, but equally 
sensitive to local ambiguities. Local methods start by comparing 
features of the right and left images. These features can be 
blocks in block-matching methods, or even local descriptors 
[21] like SIFT descriptors C3, EH, curves El, corners El, 
[11], etc. The drawback of local methods is that they do not 
provide a dense map as global methods do (meaning that the 
percentage of matched points is lower than 100%). 

Recent years have therefore seen a blooming of global meth- 
ods, which reach the best performance in recent benchmarks 
such as the Middlebury dataset [35]. But our purpose is to 
show that local methods can also be competitive. This paper 
considers the common denominator of most local methods, 
block-matching. It shows that this tool is amenable to a 
local statistical decision rule telling us whether a match is 
reliable. In fact, not all the pixels in an image pair can 
be reliably matched in real scenes (40 to 80% of pixels). 
The lack of corresponding points in the second image or 
the ambiguity in certain points stirs up gross errors in dense 
stereovision. In particular block-matching methods suffer from 
two mismatching causes that must be tackled one by one: 

1) The main mismatch cause in local methods is the 
absence of a theoretically well founded threshold to 
decide whether two blocks really match or not. Our 
main goal here will be to define such a threshold by 
an a contrario block-matching (ACBM) rejection rule, 
ensuring that two blocks do not match "just by chance." 

2) A second minor mismatch cause is the presence on 
the epipolar line of repetitive shapes or textures, a 
problem sometimes called "stroboscopic phenomenon," 
or "self-similarity." The proposed ACBM only rules out 
stochastic similarities, not deterministic ones. While the 
ACBM rule may wrongly match repetitive patterns, these types 
of mismatches are easily eliminated by a simple self-similarity 
rule (SS). We shall, however, verify that a self-similarity 
rule by itself is far from reaching the ACBM 
performance. Both rules are necessary and complementary. 

The elimination of these two sorts of mismatches is a key 
issue in block-matching methods. The problem of sifting out 
matching errors in stereovision has of course been addressed 
many times. We shall discuss a choice of the significant 
contributions for each cause of mismatch. 

¹ http://vision.middlebury.edu/stereo/ 

Occlusions are still an open problem in stereovision and one 
of the main causes of mismatch. For this reason numerous 
stereo approaches focus on detecting them. Global energy 
methods [16] address occlusions by adding a penalty term for 
occluded pixels in their energy function. In [14] the major 
contribution is the reasoning about visibility in multi-view 
stereo. [42] computes two disparity maps symmetrically and 
verifies the left-right coherence to detect occluded pixels. [28] 
asserts that if two points in the epipolar line match with two 
points with a different order then there is an occlusion. Again 
this can lead to errors if there are narrow objects in the scene. 
See also 13, which compares a choice of methods to detect 
occlusions. 

Matching pixels in poorly textured regions, where noise 
dominates signal, is clearly the main cause of error. Based 
on local SNR estimates, [5] has proposed to reject matches by 
thresholding the second derivative of the correlation function: 
the flatter the correlation function, the less reliable the match. 
In [34], the mismatches due to weakly textured objects or 
to periodic structures are considered. The author defines a 
confidently stable matching in order to establish the largest 
possible unambiguous matching at a given confidence level. 
Two parameters control the compromise between the percentage 
of bad matches and the match density of the map. Yet, 
the match density falls dramatically when the percentage of 
mismatches decreases. We will see that the method presented 
here is able to get denser disparity maps with fewer mismatches. 
Similarly, [20] tries to eliminate errors on repeated patterns. 
Yet their matches seem to concentrate mainly on image edges 
and therefore have a low density. A more primitive version 
of the rejection method developed here was applied successfully 
to the detection of moving and disappearing objects in 
[33]. This is a foremost problem in the quasi-simultaneous 
stereo usual in aerial or satellite imaging, where vehicles and 
pedestrians strongly perturb the stereo matching process. The 
extended method presented here deals with a much broader 
class of mismatches, including those due to poor signal-to-noise 
ratio. 

A. Anterior Statistical A Contrario Decision Methods 

Because of the above mentioned reasons one cannot presup- 
pose the existence of uniquely determined correspondences 
for all pixels in the image. Thus, a decision must be taken 
on whether a block in the left image actually meaningfully 
matches or not its best match in the right image. This problem 
will be addressed by the a contrario approach initiated by [6]. 
This method is generally viewed as an adaptation to image 
analysis of classic hypothesis testing. But it also has a psy- 
chophysical justification in the so-called Helmholtz principle, 
according to which all perceptions could be characterized as 
having a low probability of occurring in noise. Early versions 
of this principle in computer vision are ITtI, [10], [38]. 

A probabilistic a contrario argument is also invoked in 
the SIFT method [18], which includes an empirical rejection 
threshold. A match between two descriptors $S^1$ and $S^2$ is 
rejected if the second closest match $S^{2'}$ to $S^1$ is actually 
almost as close to $S^1$ as $S^2$ is. The typical distance ratio 
rejection threshold is 0.6, which means that $S^2$ is accepted 
if $\mathrm{dist}(S^1, S^2) < 0.6 \times \mathrm{dist}(S^1, S^{2'})$ and rejected otherwise. 
Interestingly, Lowe justifies this threshold by a probabilistic 
argument: if the second best match is almost as good as the first, 
this only means that both matches are likely to occur casually. 
Thus, they must be rejected. Recently, [31] proposed a rigorous 
theory for this intuitive method. SIFT matches are accepted or 
rejected by an a contrario methodology involving the Earth 
Mover's Distance. The a contrario methodology has also already 
been used in stereo matching. [23] proposed a probabilistic 
criterion to detect a rigid motion between two point sets taken 
from a stereo pair, and to estimate the fundamental matrix. 
This method, ORSA, shows improved robustness compared 
to a classic RANSAC method. In the context of foreground 
detection in video, [22] proposed an a contrario method for 
discriminating foreground from background pixels that was 
later refined by [29]. Even though this problem has some 
points in common with stereo matching, it is in a way less 
strict, since it only needs to learn to discriminate two classes 
of pixels. Hence they do not need to resort to image blocks, 
but rely only on a 5-dimensional feature vector composed of 
the color and motion vector of each pixel. 
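
The distance-ratio rejection rule recalled above for SIFT can be sketched in a few lines. This is only an illustrative sketch, not Lowe's implementation: the function name `ratio_test_matches`, the use of the Euclidean distance and the brute-force search are assumptions made for the example; only the 0.6 ratio comes from the text.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.6):
    """Return (i, j) pairs that pass the distance-ratio test.

    desc_a, desc_b: arrays of shape (n_a, d) and (n_b, d), with n_b >= 2.
    A candidate match is kept only if its distance is smaller than
    `ratio` times the distance to the second closest descriptor.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distances to all of desc_b
        first, second = np.argsort(dists)[:2]       # closest and second closest
        if dists[first] < ratio * dists[second]:
            matches.append((i, int(first)))
    return matches
```

Note that the rule is purely empirical: the ratio is a fixed parameter and no bound on the number of false matches follows from it.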

Among influential related works, Robin et al. [32] describe 
a method for change detection in a time series of Earth 
observation images. The change region is defined as the 
complement of the maximal region where the time series does 
not change significantly. Thus, what is controlled by the a 
contrario method is the number of false alarms (NFA) of the 
no-change region. This method can therefore be regarded as 
an a contrario region matching method. It is fundamentally 
different from the method we shall present. Indeed, Robin's 
method assumes (in addition to the statistical background 
model) a statistical image model that the time series follows 
in the regions where no change occurs, which is not feasible 
in stereo matching. 

The method in [27] is also worth mentioning. It is an 
a contrario method for detecting similar regions between 
two images. This method is a classic statistical test rather 
than an a contrario detection method in the sense of [6]. 
Indeed, the role of the background model ($H_0$ hypothesis) 
and the structure to be tested ($H_1$ hypothesis) are reversed: 
this method only controls the false negative rate and not 
the false positive rate (as in typical a contrario methods). 
Furthermore, the significance level of the statistical test is set to 
$\alpha = 0.1$ in accordance with classical statistical testing, whereas 
as demonstrated in [6] the significance level can be made much 
more secure, of the order of 10~^. 

The a contrario model for region matching in stereo vision 
used in [12] is simple and efficient. The gradient orientations 
at all region pixels are assumed independent and uniformly 
distributed in the background model. A more elaborate ver- 
sion learns the probability distribution of gradient orientation 
differences under the hypothesis that the disparity (or motion) 
is zero, and uses this distribution as a background model. Still, 
pixels are all considered as independent under the background 
model. Once this background model is learned, a given disparity 
(or motion model) is considered as meaningful if the 
number of aligned gradient orientations is sufficiently large 
within the tested region. This region matching method works 
well, but requires an initial over-segmentation of the gray-level 
image which is later refined by an a contrario region 
merging procedure. Because of the rough background model, 
false positive region matches can be observed. 

The key to a good background or a contrario model in 
block-matching would be to learn a realistic probability dis- 
tribution of the high-dimensional space of image patches. The 
seminal works [25] and [4] in the context of shape matching 
(where shapes are represented as pieces of level lines of a 
fixed size) showed that high-dimensional shape distributions 
can be efficiently approximated by the tensor product of 
(well chosen) marginal distributions. The marginal laws are 
one-dimensional, and therefore easily learned. In [26] these 
marginals are learned along the orientations of the principal 
components. The present work can be viewed as an extension 
of this curve matching method to block-matching. 

EJ proposed an alternative way of choosing detection 
thresholds such that the number of false detections under a 
given background model is ensured to stay below a given 
threshold. The procedure does not require analytical compu- 
tations or decomposing the probability as a tensor product 
of marginal distributions. Instead, detection thresholds are 
learned by Monte-Carlo simulations in a way that ensures 
the target NFA rate. This method, which was developed in the 
context of image segmentation, involves the definition of a set 
of thresholds to determine whether two neighboring regions 
are similar. However, as in [27], the detected event whose false 
positive rate is controlled is "the two regions are different," 
and not the one we are interested in for region 
matching, namely "the two regions are similar." 

In conclusion, the a contrario methodology is expanding 
to many matching decision rules, but does not seem to have 
been previously applied to the block-matching problem. We 
shall now proceed to describe the a contrario or background 
model for block-matching. The proposed model is the simplest 
that worked, but the reader may wonder if a still simpler model 
could actually work. In the next section we analyze a list of 
simpler proposals, and we explain why they must be discarded. 

B. Choosing an Adequate A Contrario Model for Patch Comparison 

The goal of this section is to reject simpler alternatives to 
the probabilistic block model that will be used. In recent years, 
patch models and patch spaces have become increasingly 
popular. We refer to [19] and references therein for algorithms 
generating sparse bases of patch spaces. Here, our goal can 
be formulated in one single question, that clearly depends on 
the observed set of patches in one particular image and not on 
the probability space of all patches. The question is: 
"What is the probability that, given two images and two similar 
patches in these images, this similarity arises just by chance?" 
The "just by chance" implies the existence of a stochastic 
background model, often called the a contrario model. 



When trying to define a well suited model for image 
blocks, many possibilities open up. Simple arguments show, 
however, that over-simplified models do not work. Let $H$ be 
the gray-level histogram of the second image $I'$. The simplest a 
contrario model of all might simply assume that the observed 
values are instances of i.i.d. random variables $\mathcal{I}'(\mathbf{x})$ with 
cumulative distribution $H$. This would lead us to affirm that 
pixels $\mathbf{q}$ in image $I$ and $\mathbf{q}'$ in image $I'$ are a meaningful match 
if their gray level difference is improbably small, 

$$\mathbb{P}\left[\, |I(\mathbf{q}) - \mathcal{I}'(\mathbf{q}')| < |I(\mathbf{q}) - I'(\mathbf{q}')| := \theta \,\right] < \frac{1}{N_{tests}}.$$

As we shall see later, the number of tests $N_{tests}$ is quite large 
in this case ($N_{tests} \approx 10^7$ for typical image sizes), since it 
must consider all possible pairs of pixels $(\mathbf{q}, \mathbf{q}')$ that may 
match. But such a small probability can be achieved (assume 
that $H$ is uniform over $[0, 255]$) only if the threshold $\theta = 
|I(\mathbf{q}) - I'(\mathbf{q}')| < 128 \cdot 10^{-7}$. On the other hand, $|I(\mathbf{q}) - I'(\mathbf{q}')|$ 
cannot be expected to be very small because both images 
are corrupted by noise, among other distortions. Even in a 
very optimistic setting, where there would be only a small 
noise distortion between both images (of about 1 gray level 
standard deviation), such a small difference would only happen 
for a tiny proportion (about $3.2 \cdot 10^{-5}$) of the correct matches. 

This means that a pixel-wise comparison would require an 
extremely strict detection threshold to ensure the absence of 
false matches, but this leads to an extremely sparse detection 
(about thirty meaningful matches per mega-pixel image). This 
suggests that the use of local information around the pixel is 
unavoidable. 

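The next simplest approach could be to compare blocks of 
a certain size $\sqrt{s} \times \sqrt{s}$ with the $\ell^2$ norm, and with the same 
background model as before. Thus, we could declare blocks 
$B_{\mathbf{q}}$ and $B_{\mathbf{q}'}$ as meaningfully similar if 

$$\mathbb{P}\left[\, \frac{1}{s} \sum_{\mathbf{x} \in B_0} |I(\mathbf{q}+\mathbf{x}) - \mathcal{I}'(\mathbf{q}'+\mathbf{x})|^2 < \frac{1}{s} \sum_{\mathbf{x} \in B_0} |I(\mathbf{q}+\mathbf{x}) - I'(\mathbf{q}'+\mathbf{x})|^2 := \theta \,\right] < \frac{1}{N_{tests}}, \qquad (1)$$

where $B_0$ is the block of size $\sqrt{s} \times \sqrt{s}$ centered at the 
position $(0,0)$. Now the test would be passed for a more 
reasonable threshold ($\theta = 6, 28, 47$ for blocks of size $3 \times 3$, 
$5 \times 5$, $7 \times 7$ respectively), which would ensure a much denser 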
response. However, this a contrario model is by far too naive 
and produces many false matches. Indeed, blocks stemming 
from natural images are much more regular than the white 
noise generated by the background model. Considering all 
pixels in a block as independent leads to overestimating 
the similarity probability of two observed similar blocks. It 
therefore leads to an over-detection. 

In order to fix this problem, we need a background model 
better reflecting the statistics of natural image blocks. But 
directly learning such a probability distribution from a single 
image in dimension 81 (for 9x9 blocks) is hopeless. 

Fig. 1. Left: Reference image of a stereo pair of images. Right: the nine 
first principal components of the 7 × 7 blocks. 

Fortunately, as pointed out in [25], high-dimensional distributions 
of shapes can be approximated by the tensor product 
of their adequately chosen marginal distributions. Such 
marginal laws, being one-dimensional, are easily learned from 
a single image. Ideally, ICA (Independent Component Analysis) 
should be used to learn which marginal laws are the 
most independent, but the simpler PCA will prove 
accurate enough for our purposes. Indeed, it ensures that the 
principal components are decorrelated, a first approximation 
to independence. Fig. 2 gives a visual assessment of how 
well a local PCA model simulates image patches in a class. 
Nevertheless, the independence assumption will be used as a 
tool for building the a contrario model. This independence is 
not an empirical finding on the set of patches. 

C. Plan 

Section II introduces the stochastic block model learned 
from a reference image. Section HFbI presents the a contrario 
method applied to disparity estimation in stereo pairs and 
treats the main problem of deciding whether two pixels match. 
Theorem 1 is the main result of this section, ensuring a 
controlled number of false detections. Section III tackles 
the stroboscopic problem by a parameterless method, and 
demonstrates the necessity and complementarity of the a 
contrario and self-similarity rejections. Experimental results 
and comparison with other methods are in Section IV. Section 
V concludes. An appendix summarizes the algorithm and 
gives its complete pseudo-code. 

II. The A Contrario Model for Block-Matching 

We shall denote by $\mathbf{q} = (q_1, q_2)$ a pixel in the reference 
image $I$ and by $B_{\mathbf{q}}$ a block centered at $\mathbf{q}$. To fix ideas, the 
block will be a square throughout this paper, but this is by no 
means a restriction. A different shape (rectangle, disk) would 
be possible, and even a variable shape. Given a point $\mathbf{q}$ and its 
block $B_{\mathbf{q}}$ in the reference image, block-matching algorithms 
look for a point $\mathbf{q}'$ in the second image $I'$ whose block $B_{\mathbf{q}'}$ is 
similar to $B_{\mathbf{q}}$. 

A. Principal Component Analysis 

For building a simple a contrario model the principal 
component analysis can play a crucial role, as shown in 
[26]. Indeed, it allows for effective dimension reduction and 
decorrelates these dimensions, giving a first approximation to 
independence. This facilitates the construction of a probability 
density function for the blocks as a tensor product of its 
marginal densities. Let $B_{\mathbf{q}}$ be the block of a pixel $\mathbf{q}$ in the 
reference image and $(x_1^{\mathbf{q}}, \dots, x_s^{\mathbf{q}})$ the intensity gray levels in 
$B_{\mathbf{q}}$, where $s$ is the number of pixels in $B_{\mathbf{q}}$. Let $n$ be the 
number of pixels in the image. Consider the matrix $X = (x_i^j)$, 
$1 \le i \le s$, $1 \le j \le n$, consisting of the set of all data vectors, one 
column per pixel in the image. Then, the covariance matrix of 
the block is $C = \mathbb{E}\,(X - \bar{x}\mathbf{1})(X - \bar{x}\mathbf{1})^T$, where $\bar{x}$ is the column 
vector of size $s \times 1$ storing the mean values of matrix $X$ 
and $\mathbf{1} = (1, \dots, 1)$ a row vector of size $1 \times n$. Notice that $\bar{x}$ 
corresponds to the block whose $k$-th pixel is the average of all 
$k$-th pixels of all blocks in the image. Thus, $\bar{x}$ is very close to 
a constant block, with the constant equal to the image average. 
The eigenvectors of the covariance matrix are called principal 
components and are orthogonal. They give the new coordinate 
system we shall use for blocks. Fig. 1 shows the first principal 
blocks. 
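
The construction above can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions, not the authors' code: the function name `block_pca` is hypothetical, only blocks lying entirely inside the image are used (so there are slightly fewer columns than pixels), and the expectation in $C$ is replaced by the empirical average over blocks.

```python
import numpy as np

def block_pca(image, side=7):
    """PCA of all side x side blocks of a gray-level image.

    Returns (mean_block, components, eigenvalues, coeffs):
      mean_block : (s,) average block, with s = side * side
      components : (s, s) orthonormal principal components (one per column),
                   sorted by decreasing eigenvalue
      eigenvalues: (s,) eigenvalues in decreasing order
      coeffs     : (s, n) PCA coordinates of the n blocks (one column each)
    """
    image = np.asarray(image, dtype=float)
    # Assumption: only blocks fully contained in the image are considered.
    windows = np.lib.stride_tricks.sliding_window_view(image, (side, side))
    X = windows.reshape(-1, side * side).T            # data matrix, shape (s, n)
    mean_block = X.mean(axis=1)                       # the vector x-bar of the text
    Xc = X - mean_block[:, None]                      # X - x-bar 1
    C = Xc @ Xc.T / Xc.shape[1]                       # empirical covariance (s, s)
    eigenvalues, components = np.linalg.eigh(C)       # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]             # re-sort in decreasing order
    eigenvalues, components = eigenvalues[order], components[:, order]
    coeffs = components.T @ Xc                        # projections on the PCA basis
    return mean_block, components, eigenvalues, coeffs
```

By construction the rows of `coeffs` are empirically uncorrelated, which is the "first approximation to independence" invoked in the text.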

Usually, the eigenvectors are sorted in order of decreasing 
eigenvalue. In that way the first principal components are the 
ones that contribute most to the variance of the data set. By 
keeping the first $N < s$ components with the largest eigenvalues, the 
dimension is reduced but the significant information is retained. 
While this global ordering could be used to select the main 
components, a local ordering for each block will instead 
be used for the statistical matching rule. In other words, 
for each block, a new order for the principal components 
will be established given by the corresponding ordered PCA 
coordinates (the decreasing order is for the absolute values). 
In that way, comparisons of these components will be made 
from the most meaningful to the least meaningful one for this 
particular block. 

Each block is represented by $N$ ordered coefficients 
$(c_{\sigma_{\mathbf{q}}(1)}(\mathbf{q}), \dots, c_{\sigma_{\mathbf{q}}(N)}(\mathbf{q}))$, where $c_i(\mathbf{q})$ is the resulting 
coefficient after projecting $B_{\mathbf{q}}$ onto the principal component 
$i \in \{1, \dots, s\}$ and $\sigma_{\mathbf{q}}$ the permutation representing the final 
order when ordering the absolute values of components for 
this particular $\mathbf{q}$ in decreasing order. By a slight abuse of 
notation we will write $c_i(\mathbf{q})$ instead of $c_{\sigma_{\mathbf{q}}(i)}(\mathbf{q})$, knowing that 
it represents the local order of the best principal components. 
But notice that $\sigma_{\mathbf{q}}(1) = 1$ for most $\mathbf{q}$ because of the dominance 
of the first principal component. Moreover notice that this first 
component has a quite different coefficient histogram than the 
other ones (see Fig. 4), because it approximately computes a 
mean value of the block. Indeed, the barycenter of all blocks 
is roughly a constant block whose average grey value is the 
image average grey level. The set of blocks is elongated in 
the direction of the average grey level and, therefore, the 
first component computes roughly an average grey level of 
the block. This explains why the first component histogram is 
similar to the image histogram. 
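
The local ordering described above amounts to a per-block sort of the PCA coefficients by decreasing absolute value. A minimal sketch follows; the name `local_order` and the array layout (one column per block, rows in global eigenvalue order) are assumptions made for illustration.

```python
import numpy as np

def local_order(coeffs, n_keep):
    """Per-block ordering of PCA coefficients by decreasing absolute value.

    coeffs: array of shape (s, n), one column of PCA coefficients per block.
    Returns (sigma, ordered), both of shape (n_keep, n):
      sigma[k, j]   index of the k-th most significant component of block j
      ordered[k, j] the corresponding coefficient
    """
    coeffs = np.asarray(coeffs, dtype=float)
    # Stable sort on -|c| so that ties keep the global (eigenvalue) order.
    sigma = np.argsort(-np.abs(coeffs), axis=0, kind="stable")[:n_keep]
    ordered = np.take_along_axis(coeffs, sigma, axis=0)
    return sigma, ordered
```

For a block with coefficients (5, -1, 2), for instance, the two most significant components are the first and the third, in that order.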

B. A Contrario Similarity Measure between Blocks 

Definition 1 (A contrario model): We call a contrario 
block model associated with a reference image a random block 
$\mathcal{B}$ described by its (random) components $\mathcal{B} = (C_1, \dots, C_s)$ on 
the PCA basis of the blocks of the reference image, satisfying 

• the components $C_i$, $i = 1, \dots, s$, are independent random 
variables; 

• for each $i$, the law of $C_i$ is the empirical histogram of the 
$i$-th PCA component $c_i(\cdot)$ of the blocks of the reference 
image. 
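
Definition 1 can be made concrete by simulating random blocks: each PCA coordinate is drawn independently from the empirical distribution of that coordinate over the reference image. The sketch below is illustrative only; the function name and argument layout are hypothetical, and drawing observed values with replacement is one simple way (assumed here) of sampling from an empirical histogram.

```python
import numpy as np

def sample_a_contrario_blocks(mean_block, components, coeffs, n_samples, rng=None):
    """Draw random blocks whose PCA components are independent.

    mean_block: (s,) average block; components: (s, s) PCA basis (columns);
    coeffs: (s, n) observed PCA coordinates of the reference image blocks.
    Returns an array of shape (n_samples, s), one simulated block per row.
    """
    rng = np.random.default_rng(rng)
    s, n = coeffs.shape
    # One independent index draw per component and per sample: component i of a
    # simulated block is taken from a randomly chosen observed block, and a
    # different block is chosen for every component.
    idx = rng.integers(0, n, size=(s, n_samples))
    sampled = np.take_along_axis(coeffs, idx, axis=1)       # shape (s, n_samples)
    return (components @ sampled + mean_block[:, None]).T   # back to pixel space
```

Comparing such simulated blocks with real ones gives a visual check of how adequate the independence assumption is.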
C. Robust Similarity Distance 

Fig. 2. (a) Patches of the reference image, chosen at random. (b) Simulated 
random blocks following the law of the reference image. This experiment 
illustrates the (relative) adequacy of the a contrario model. Nevertheless, the 
PCA components are empirically uncorrelated, but of course not independent. 


The reference image will be the secondary image $I'$. Fig. 2 
shows patches generated according to the above a contrario 
block model and compares them to blocks picked at random 
in the reference image. The a contrario model will be used 
for computing a block resemblance probability as the product 
of the marginal resemblance probabilities of the $C_i$ in the 
a contrario model, which is justified by the independence 
of $C_i$ and $C_j$ for $i \neq j$. There is a strong adequacy of the 
a contrario model to the empirical model, since the PCA 
transform ensures that $C_i$ and $C_j$ are uncorrelated for $i \neq j$, a 
first approximation of the independence requirement. 

We start by defining the resemblance probability between 
two blocks for a single component. Denote by $H_i(\cdot) := 
H_i(c_i(\cdot))$ the normalized cumulative histogram of the $i$-th PCA 
block component $c_i(\cdot)$ for the secondary image $I'$. 

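Definition 2 (Resemblance probability): Let $B_{\mathbf{q}}$ be a block 
in $I$ and $B_{\mathbf{q}'}$ a block in $I'$. Define the probability that a random 
block $\mathcal{B}$ of $I'$ resembles $B_{\mathbf{q}}$ as closely as $B_{\mathbf{q}'}$ does in the $i$-th 
component by 

$$p^i_{\mathbf{q}\mathbf{q}'} = \begin{cases} H_i(\mathbf{q}') & \text{if } H_i(\mathbf{q}') - H_i(\mathbf{q}) > H_i(\mathbf{q}); \\ 1 - H_i(\mathbf{q}') & \text{if } H_i(\mathbf{q}) - H_i(\mathbf{q}') > 1 - H_i(\mathbf{q}); \\ 2\,|H_i(\mathbf{q}) - H_i(\mathbf{q}')| & \text{otherwise.} \end{cases}$$

Fig. 3 illustrates how the resemblance probability $p^i_{\mathbf{q}\mathbf{q}'}$ is 
computed and Fig. 4 shows empirical marginal densities. 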
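
As an illustration, the per-component resemblance probability can be computed from the cumulative histogram values of the two blocks. The sketch assumes the three-case form suggested by Fig. 3: twice the distance $|H_i(\mathbf{q}) - H_i(\mathbf{q}')|$ in general, with the interval clipped when it would leave $[0, 1]$. The helper names are hypothetical.

```python
import numpy as np

def cumulative_histogram(samples):
    """Return H, mapping a value c to the fraction of samples <= c."""
    sorted_samples = np.sort(np.asarray(samples, dtype=float))
    n = sorted_samples.size
    return lambda c: np.searchsorted(sorted_samples, c, side="right") / n

def resemblance_probability(h_q, h_qp):
    """Resemblance probability from h_q = H_i(q) and h_qp = H_i(q'), both in [0, 1].

    It is the length of the interval of cumulative values lying at least as
    close to h_q as h_qp does, clipped to [0, 1].
    """
    if h_qp - h_q > h_q:            # symmetric interval would go below 0
        return h_qp
    if h_q - h_qp > 1.0 - h_q:      # symmetric interval would go above 1
        return 1.0 - h_qp
    return 2.0 * abs(h_q - h_qp)
```

Here `cumulative_histogram` would be built from the $i$-th PCA coordinates of the blocks of the secondary image and evaluated at $c_i(\mathbf{q})$ and $c_i(\mathbf{q}')$.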



The first principal components of $B_{\mathbf{q}}$, being in decreasing 
order, contain the relevant information on the block. Thus, if 
two blocks are not similar for one of the first components, 
they should not be matched, even if their next components are 
similar. Due to this fact, the components of $B_{\mathbf{q}}$ and another 
block $B_{\mathbf{q}'}$ must be compared with a non-decreasing exigency 
level. In addition, in the a contrario model, the number of 
tested correspondences should be as small as possible to 
reduce the number of false alarms. A quantization of the 
tested resemblance probabilities is therefore required to limit 
the number of tests. 

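These two remarks lead us to define the quantized resemblance 
probability as the smallest non-decreasing sequence of quantized 
probabilities bounding from above the sequence $p^i_{\mathbf{q}\mathbf{q}'}$. 

Definition 3 (Quantized probability): Let $B_{\mathbf{q}}$ be a block in 
$I$. Let $\Pi := \{\pi_j = 1/2^{j-1}\}_{j=1,\dots,Q}$ be a set of quantized 
probability thresholds and let 

$$T := \{\, p = (p_1, \dots, p_N) \mid p_i \in \Pi,\ p_i \le p_j \text{ if } i < j \,\}$$

be the family of non-decreasing $N$-tuples in $\Pi^N$, endowed 
with the order $a \ge b$ if and only if $a_i \ge b_i$ for all $i$. The 
quantized probability sequence associated with the event that 
the random block $\mathcal{B}$ resembles $B_{\mathbf{q}}$ as closely as $B_{\mathbf{q}'}$ does in the 
$i$-th component is defined by 

$$(\hat{p}^1_{\mathbf{q}\mathbf{q}'}, \dots, \hat{p}^N_{\mathbf{q}\mathbf{q}'}) = \inf \{\, t \in T : t \ge (p^1_{\mathbf{q}\mathbf{q}'}, \dots, p^N_{\mathbf{q}\mathbf{q}'}) \,\}. \qquad (2)$$

Notice that the infimum $(\hat{p}^1_{\mathbf{q}\mathbf{q}'}, \dots, \hat{p}^N_{\mathbf{q}\mathbf{q}'})$ is uniquely defined 
and belongs to $T$. Put another way, the quantized probability 
vector $(\hat{p}^1_{\mathbf{q}\mathbf{q}'}, \dots, \hat{p}^N_{\mathbf{q}\mathbf{q}'})$ is the smallest upper bound of the 
resemblance probabilities $(p^1_{\mathbf{q}\mathbf{q}'}, \dots, p^N_{\mathbf{q}\mathbf{q}'})$ that can be found 
in $T$. Fig. 5 illustrates the quantized probabilities in two cases. 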
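
The smallest non-decreasing quantized upper bound can be computed in two passes: round each resemblance probability up to the nearest threshold, then take a running maximum. The following is a sketch under the assumption that the thresholds are the powers $1, 1/2, \dots, 1/2^{Q-1}$; the function name and the numerical values in the usage note are hypothetical.

```python
import numpy as np

def quantized_probabilities(p, Q=5):
    """Smallest non-decreasing sequence of thresholds bounding p from above.

    p: resemblance probabilities of the N components, already in local order.
    Thresholds are assumed to be 1, 1/2, ..., 1/2**(Q-1).
    """
    p = np.asarray(p, dtype=float)
    thresholds = 0.5 ** np.arange(Q - 1, -1, -1)        # ascending: 2^-(Q-1), ..., 1
    idx = np.searchsorted(thresholds, p, side="left")   # first threshold >= p_i
    idx = np.minimum(idx, Q - 1)                        # guard against p_i > 1
    quantized = thresholds[idx]                         # componentwise upper bound
    return np.maximum.accumulate(quantized)             # enforce non-decreasing order
```

With $Q = 5$ and $N = 9$, a sequence such as (0.05, 0.06, 0.1, 0.12, 0.3, 0.6, 0.7, 0.2, 0.9) quantizes to (1/16, 1/16, 1/8, 1/8, 1/2, 1, 1, 1, 1), whose product is $1/(16^2 \cdot 8^2 \cdot 2)$. If the first two values are replaced by 0.3 and 0.6, the running maximum forces every later component up to 1 and the product becomes 1/2.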




Fig. 5. Two examples of probabilities with $Q = 5$ and $N = 9$. The probability 
thresholds are in ordinate and the features in abscissa. The resemblance 
probabilities are represented with small crosses and quantized probabilities 
with small squares. The example on the left has a final probability of 
$1/(16^2 \cdot 8^2 \cdot 2)$. The right example has the same resemblance probabilities 
except for features 1 and 2, but the final probability is $1/2$. Only the 
configuration on the left corresponds to a meaningful match. 

Fig. 3. Normalized cumulative histogram of the $i$-th PCA coordinates of the 
secondary image. $c_i(\mathbf{q})$ is the $i$-th PCA coordinate value in the first image. 
The resemblance probability $p^i_{\mathbf{q}\mathbf{q}'}$ for the $i$-th component is twice the distance 
$|H_i(\mathbf{q}) - H_i(\mathbf{q}')|$ when $H_i(\mathbf{q})$ is not too close to the values 0 or 1. 



Fig. 4. Histogram of the reference image, followed by the first five histograms of the block PCA coordinates. The first principal component roughly computes 
a mean of the block, which explains why its histogram is so similar to the image histogram. 

Proposition 1 (Quantized resemblance probability): Let 
$B_{\mathbf{q}} \in I$ and $B_{\mathbf{q}'} \in I'$ be two blocks. Assume the principal 
components $i \in \{1, 2, \dots, s\}$ are reordered so that 
$|c_1(\mathbf{q})| \ge |c_2(\mathbf{q})| \ge \dots \ge |c_s(\mathbf{q})|$. The probability of the 
event "the random block $\mathcal{B}$ has its $N$ first components as 
similar to those of $B_{\mathbf{q}}$ as those of $B_{\mathbf{q}'}$ are" is 

$$\mathrm{Pr}_{\mathbf{q}\mathbf{q}'} = \prod_{i=1}^{N} \hat{p}^i_{\mathbf{q}\mathbf{q}'}. \qquad (3)$$
This is a direct consequence of Def. 1, the principal components 
of $\mathcal{B}$ being independent. The resemblance probability 
is the product of the marginal resemblance probabilities. As 
is classic in statistical decision, we could stop and use the 
above resemblance probability. But, even if each 
resemblance probability $\mathrm{Pr}_{\mathbf{q}\mathbf{q}'}$ is low, the large number of 
resemblance tests allows for a very large number of false 
matches. Our next goal therefore is to define a number of 
false alarms, and not a probability, as the right criterion. To 
this aim, we need to estimate the number of tests. 

D. Number of Tests 

The number of tests for comparing all the blocks of image 
$I$ with all the blocks in image $I'$ is the product of three factors. 
The first one is the image size $\#I$. The second is the size of 
the search region, denoted by $S' \subset I'$. We mentioned before 
that the search is done on the epipolar line. In practice, a 
segment of this line is enough. If $\mathbf{q} = (q_1, q_2)$ is the point of 
reference, it is enough to look for $\mathbf{q}' = (q'_1, q_2) \in I'$ such that 
$q'_1 \in [q_1 - R, q_1 + R]$, where $R$ is a fixed integer larger than 
the maximal possible disparity. The third and most important 
factor is the number of different non-decreasing probability 
distributions $FC_{N,Q} = \#T$ that can be envisaged. Of course not 
all of these tests are performed, but only the one indicated by 
the observed block $B_{\mathbf{q}'}$. Yet, the choice of this unique test is 
steered by an a posteriori observation, while the 
expectation of the number of false alarms (NFA) must be 
computed a priori. Thus we must compute the NFA as though 
all comparisons for all quantized non-decreasing probabilities were 
performed. A test can never be defined a posteriori; it cannot 
be steered by the observation. Thus the number of tests is not 
the number of tests effectively performed. There are $\#T$ ways 
each couple of blocks could a priori be compared. In other 
terms, $\#T$ different distances are a priori tested. Theorem 1 
will ultimately justify the following definition. 

Definition 4 (Number of tests): With the above notation we 
call the number of tests for matching two images $I$ and $I'$ the 
integer $N_{test} = \#I \cdot \#S' \cdot \#T = \#I \, (2R + 1) \, FC_{N,Q}$. 

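Lemma 1: With the above notation,

$$FC_{N,Q} = \sum_{t=0}^{Q-1} (t+1)\, FC_{N-2,\,Q-t},$$

where

$$FC_{N,Q} := \#\{ f : [1,N] \to [1,Q] \mid f(x) \le f(y),\ \forall x < y \}.$$

In order to prove this result we write

$$\widetilde{FC}_{N,Q} := \#\{ f : [1,N] \to [1,Q] \mid f(1) = 1,\ f(N) = Q,\ f(x) \le f(y),\ \forall x < y \}.$$

Since $FC_{N,Q} = \sum_{t=0}^{Q-1} (t+1)\, \widetilde{FC}_{N,Q-t}$ and
$\widetilde{FC}_{N,Q} = FC_{N-2,Q}$, the result follows.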
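The recursion can be checked numerically. The sketch below assumes the lemma reads $FC_{N,Q} = \sum_{t=0}^{Q-1} (t+1)\, FC_{N-2,Q-t}$, with the base cases $FC_{0,Q} = 1$ and $FC_{1,Q} = Q$ (these base cases are our assumption, not stated in the text), and compares the result with the standard multiset count $\binom{N+Q-1}{N}$ of non-decreasing maps.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def fc(n, q):
    """Number of non-decreasing maps [1, n] -> [1, q] via the assumed recursion."""
    if q <= 0:
        return 0
    if n == 0:
        return 1  # the empty map (assumed base case)
    if n == 1:
        return q  # any single value is allowed (assumed base case)
    # t + 1 is the number of ways to place a value range of width q - t
    # inside [1, q]; the n - 2 interior points are then free.
    return sum((t + 1) * fc(n - 2, q - t) for t in range(q))


# Compare against the closed-form multiset count for small sizes.
agree = all(fc(n, q) == comb(n + q - 1, n)
            for n in range(0, 12) for q in range(1, 8))
print(agree, fc(9, 5))
```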

We are now in a position to define a number of false alarms, 
which will control the overall number of false detections on 
the whole image. 

Definition 5 (Number of false alarms): Let $B_q \in I$ and
$B_{q'} \in I'$ be two observed blocks. Assume the principal
components $i \in \{1, 2, \dots, s\}$ are reordered so that
$|c_1(q)| \ge |c_2(q)| \ge \dots \ge |c_s(q)|$. We define the Number of
False Alarms associated with the event "the random block $B$ has its
$N$ first components as similar to those of $B_q$ as those of
$B_{q'}$ are" by

$$NFA_{q,q'} = N_{test} \cdot \Pr_{qq'} = N_{test} \cdot \prod_{i=1}^{N} p^i_{qq'},$$

where $N_{test}$ comes from Def. 4 and $\Pr_{qq'}$ is the probability
that the random block $B$ has its first $N$ PCA components as
similar to those of $B_q$ as those of $B_{q'}$ are (Prop. 1).

Definition 6 (ε-meaningful match): A pair of pixels $q$ and
$q'$ in a stereo pair $(I, I')$ is an ε-meaningful match if

$$NFA_{q,q'} \le \varepsilon. \qquad (5)$$
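Definitions 5 and 6 amount to a product followed by a threshold. The following minimal sketch works in log10 to avoid underflow when many small probabilities are multiplied; the function names and the example values are hypothetical, and the per-component quantized probabilities are assumed to be given.

```python
import math


def log10_nfa(n_test, probs):
    """log10 of NFA = N_test * prod(probs), computed in the log domain."""
    if any(p <= 0.0 for p in probs):
        return -math.inf  # a zero probability gives NFA = 0
    return math.log10(n_test) + sum(math.log10(p) for p in probs)


def is_meaningful(n_test, probs, eps=1.0):
    """True when the pair is an eps-meaningful match, i.e. NFA <= eps."""
    return log10_nfa(n_test, probs) <= math.log10(eps)


# Assumed toy values: 10^6 tests and three retained components.
print(is_meaningful(10**6, [0.001, 0.001, 0.001]))  # NFA = 1e-3
print(is_meaningful(10**6, [0.1, 0.1, 0.1]))        # NFA = 1e3
```

Reporting the log10 of the NFA is also a convenient way to compare the reliability of two accepted matches.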

E. The Main Theorem 

As computed above, the NFA has the dimension of
a number (of false alarms) per image. An alternative would
be to measure the NFA as a number of false alarms per
pixel, in which case the number of tests would not contain
the image cardinality factor $\#I$. With the proposed
NFA, it is up to the users to decide which number of false
alarms per image they consider tolerable. The NFA of a match
actually gives a security level: the smaller the NFA, the more
meaningful the match intuitively is. But Theorem 1 will give the
real meaning of the NFA. To state it, we will use a clever trick
used by Shannon in his information theory [37], pages 22-23,
namely to treat the probability of an event as a random variable
and to play with its expectation. Here the NFA will become a
random variable, replacing $B_{q'}$ with $B$ in its definition.

In the a contrario model, each comparison of $B_q$ with
some $B_{q'}$ is interpreted as a comparison of $B_q$ to a trial of
the random block model $B$. In total, $B_q$ is compared with
$2R+1$ other blocks for each $q \in I$. So, we are led to distinguish
for each $q$ $(2R+1)$ trials which are as many i.i.d. random
blocks $B^{q,j}$, $j \in \{1, 2, \dots, 2R+1\}$, all with the same law as
$B$. They model a contrario the $(2R+1)$ trials by which $B_q$
is matched to $(2R+1)$ blocks in $I'$. We are interested in the
expectation of the number of such trials being successful (i.e.
ε-meaningful), "just by chance."

Consider the event $E_{q,j}$ that a random block $B^{q,j}$ in the a
contrario model with reference image $I'$ meaningfully matches
$B_q$. If this happens, it is obviously a false alarm. We shall
denote by $\chi_{q,j}$ the random characteristic function associated
with this event, with the convention that $\chi_{q,j} = 1$ if $E_{q,j}$ is true,
$\chi_{q,j} = 0$ otherwise. Similarly, $NFA_{q,j}$ and $p^i_{q,j}$ are the NFA and
quantized probabilities associated with the event $E_{q,j}$.

Theorem 1: Let $\Gamma = \sum_{q \in I,\ j \in \{1,\dots,2R+1\}} \chi_{q,j}$ be the random
variable representing the number of occurrences of an
ε-meaningful match between a deterministic patch in the first
image and a random patch in the second image. Then the
expectation of $\Gamma$ is less than or equal to ε.
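Proof: We have

$$\chi_{q,j} = \begin{cases} 1, & \text{if } NFA_{q,j} \le \varepsilon; \\ 0, & \text{if } NFA_{q,j} > \varepsilon. \end{cases}$$

Then, by the linearity of the expectation,

$$E[\Gamma] = \sum_{q,j} E[\chi_{q,j}] = \sum_{q,j} P[NFA_{q,j} \le \varepsilon].$$

The probability inside the above sum can be computed by
Definitions 5 and 2:

$$P[NFA_{q,j} \le \varepsilon] = P\Big[\prod_{i=1}^{N} p^i_{q,j} \le \frac{\varepsilon}{N_{test}}\Big].$$

There are many probability $N$-tuples $p = (p^i_{q,j})_{i=1,\dots,N}$
for which the inequality inside the above probability holds.
Nevertheless, the probabilities having been quantized, we can
reduce it to a (non-disjoint) union of events, namely all $p \in T$
such that $\prod_i p_i \le \varepsilon / N_{test}$. By the Bonferroni inequality the
considered probability can be upper-bounded by the sum of
the probabilities of these events. In addition, the intersection below
involves only independent events according to our background
model. Thus

$$P\Big[\prod_{i} p^i_{q,j} \le \frac{\varepsilon}{N_{test}}\Big]
= P\Big[\bigcup_{\substack{p \in T \\ \prod_i p_i \le \varepsilon/N_{test}}} \bigcap_{i} \big(p^i_{q,j} \le p_i\big)\Big]
\le \sum_{\substack{p \in T \\ \prod_i p_i \le \varepsilon/N_{test}}} \prod_{i} p_i
\le \#T \cdot \frac{\varepsilon}{N_{test}} = \frac{\varepsilon}{\#I \cdot \#S'},$$

where we have also used $N_{test} = \#I \cdot \#S' \cdot \#T$. So we have shown
that

$$E[\Gamma] = \sum_{q,j} P[NFA_{q,j} \le \varepsilon] \le \sum_{q,j} \frac{\varepsilon}{\#I \cdot \#S'} = \varepsilon.$$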

The parameter ε is the only legitimate parameter of the
method. The other ones, namely the block size $\sqrt{s}$, the number
of principal components $N$ and the number of quantized
probability thresholds $Q$, can be fixed once and for all for
a given SNR (Signal to Noise Ratio). All experiments are 
made with a common SNR, but a higher SNR would allow
smaller blocks and consequently a different set of parameters.
The question of how many false alarms should be acceptable 
in a stereo pair depends on the size of the images. In all 
experiments with moderate size images, of the order of 10^ 
pixels, the decision was to fix ε = 1. Thanks to Theorem 1 this
means that one false alarm is expected on average
for images of that size. Fixing ε thus makes the method
parameterless for all moderately sized images.

III. The Self-Similarity Threshold 

Urban environments contain many periodic local structures
(for example the windows on a façade). Since, in general, the
number of repetitions is insignificant with respect to the number
of blocks that have been used to estimate the empirical a
contrario probability distributions, the a contrario model does
not learn this repetition and can be fooled by it,
signaling a significant match for each repetition of the
same structure. Of course, one of those significant matches is
the correct one, but chances are that the correct one is not
the most significant. In such a situation two choices are left:
(i) try to match the whole set of self-similar blocks of $I$ as a
single multi-block (typically, global methods such as graph
cuts do that implicitly); or (ii) remove any (probably wrong)
response whenever this stroboscopic effect is detected.
The first alternative would lead to errors anyway if the similar
blocks do not have the same height, or if some of them are
out of field in one of the images. Fortunately, stereo pair
block-matching yields a straightforward adaptive threshold. A
distance function $d$ between blocks being defined, let $q$ and $q'$
be points in the reference and secondary images respectively
that are candidates to match with each other. The match of $q$
and $q'$ will be accepted if the following self-similarity (SS)
condition is satisfied:



$$d(B_q, B_{q'}) < \min\{\, d(B_q, B_r) \mid r \in I \cap S(q) \,\} \qquad (6)$$

where $S(q) = [q_1 - R, q_1 + R] \setminus \{q_1, q_1 + 1, q_1 - 1\}$ and $R$ is the
search range. As noted earlier, the search for correspondences
can be restricted to the epipolar line. This is why the automatic
threshold is restricted to $S(q)$. The distance used in the
self-similarity threshold is the sum of squared differences (SSD)
of all the pixels in the block, and the block size is the same
as the block size used for ACBM.
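A minimal sketch of the self-similarity test (6), assuming grayscale images stored as NumPy arrays indexed as (row, column), with the column playing the role of $q_1$. The function names, the handling of image borders and the choice to accept a match when $S(q)$ is empty are all our assumptions.

```python
import numpy as np


def block(img, row, col, half):
    """Square block of side 2 * half + 1 centred at (row, col)."""
    return img[row - half: row + half + 1,
               col - half: col + half + 1].astype(np.float64)


def ssd(a, b):
    """Sum of squared differences between two blocks."""
    return float(np.sum((a - b) ** 2))


def passes_self_similarity(ref, sec, row, col, col_sec, R, half=4):
    """Check d(B_q, B_q') < min over r in S(q) of d(B_q, B_r)."""
    width = ref.shape[1]
    b_q = block(ref, row, col, half)
    d_match = ssd(b_q, block(sec, row, col_sec, half))
    # S(q): columns within R of q, excluding q and its two neighbours,
    # restricted to positions where a full block fits in the image.
    candidates = [c for c in range(col - R, col + R + 1)
                  if abs(c - col) > 1 and half <= c < width - half]
    if not candidates:
        return True  # assumption: nothing to compare against, so accept
    d_self = min(ssd(b_q, block(ref, row, c, half)) for c in candidates)
    return d_match < d_self


rng = np.random.default_rng(0)
texture = rng.random((21, 60))
stripes = np.tile(np.arange(60) % 4, (21, 1)).astype(np.float64)
# Secondary images are 2-pixel horizontal translations of the references.
print(passes_self_similarity(texture, np.roll(texture, 2, axis=1), 10, 30, 32, R=5))
print(passes_self_similarity(stripes, np.roll(stripes, 2, axis=1), 10, 30, 32, R=5))
```

On the random texture the correct match is strictly better than any block of the reference image in $S(q)$, whereas on the period-4 stripes an identical block exists within the search range and the match is rejected.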

Computing the similarity of matches in one of the images 
is not a new idea in stereo vision. In ll^ the authors define 
the distinctiveness of an image point q as the perceptual 
distance to the most similar point other than itself in the 
search window. In particular, they study the case of the auto- 
SSD function (Sum of Squared Differences computed in the 
same image). The flatness of the function contains the expected 
match accuracy and the height of the smallest minimum of the 
auto- SSD function beside the one in the origin gives the risk of 
mismatch. They are able to match ambiguous points correctly 
by matching intrinsic curves [39]. However, the proposed
algorithm only accepts matches when their quality is above
a certain threshold. The obtained disparity maps are rather 
sparse and the accepted matches are completely concentrated 
on the edges of the image. According to [34], the ambiguous
correspondences should be rejected. In that work a new
stability property is defined. This property is one condition
a set of matches must satisfy to be considered unambiguous
at a given confidence level. The stability constraint and the
tuning of two parameters make it possible to handle flat or
periodic autocorrelation functions. This last
algorithm is compared with our results in Section IV.

A. A Contrario vs Self-Similarity

Is the self-similarity (SS) threshold really necessary? One
may wonder whether the a contrario decision rule to accept
or reject correspondences between patches would be sufficient
by itself. Conversely, is the self-similarity threshold enough to
reject false matches in a correlation algorithm? This section
addresses both questions and analyzes some simple examples
highlighting the necessity and complementarity of both tests.
For each example we compare the result of the a
contrario test with the result of a classic correlation algorithm
combined with the self-similarity threshold alone.

First consider two independent Gaussian noise images (Fig.
6). Obviously, we would like to reject any possible
match between these two images. As expected (this is a
sanity check), the a contrario test rejects all the possible
patch matches. On the other hand, the correlation algorithm
combined with the self-similarity threshold is not sufficient: many false
matches are accepted.





Fig. 6. (a) Reference noise image, (b) No match at all has been accepted 
by the a contrario test! (c) Many false correspondences have been accepted 
by the self-similarity threshold. 

The second comparative test is about occlusions. If a point
of the scene can be observed in only one of the images of
the stereo pair, then an estimation of its disparity is simply
impossible. The best decision is to reject its matches. A good
example to illustrate the performance of both rejection tests,
ACBM and SS, is the Map image (Middlebury stereo vision
database, Fig. 7), which has a large baseline and therefore
a large number of occluded pixels. ACBM again gives
the best result (see Table I). The table indicates that the
self-similarity test only removes a few additional points. Yet, even
if the proportion of eliminated points is tiny, such mismatches
can be very annoying and the gain is not negligible at all.

The a contrario methodology cannot detect the ambiguity 
inherent in periodic patterns. Indeed, periodicity certainly does 
not occur "just by chance." The match between a window and 




Fig. 7. (a) Reference image (b) Secondary image. The rectangular object 
occludes part of the background (c) The a contrario test does not accept any 
match for pixels in the occluded areas, (d) With the self-similarity threshold 
the disparity map is denser, but wrong disparities remain in the occluded 
region. 





            Bad matches   Total matches
SS             3.35%         85.86%
ACBM           0.37%         64.85%
ACBM+SS        0.36%         64.87%

TABLE I

Quantitative comparison of several algorithms on
Middlebury's Map image: the block-matching algorithm with
the self-similarity threshold (SS), the a contrario algorithm
(ACBM) and the algorithm combining both (ACBM+SS). The
percentage of matches for each algorithm is computed in the
whole image and among these the number of wrong matches is
also given. A match is considered wrong if its disparity
difference with the ground truth disparity is larger than one
pixel.



another identical window on a building façade is obviously
not casual and is therefore legitimately accepted by an a contrario
model. In this situation, the self-similarity test is necessary.
A synthetic case is considered in Fig. 8, where the
correspondences accepted by the a contrario test are
completely wrong for the repeated lines. On the contrary, the
self-similarity threshold is able to reject matches in this region of
the image.

In short, ACBM and SS are both necessary and complementary.
SS only removes a tiny additional number of errors,
but even a few outliers can be very annoying in stereo. From
now on, a possible match $(q, q')$ will therefore be accepted
only if it is a meaningful match (ACBM test in Def. 6) and
satisfies the SS condition given by (6).

IV. Comparative Results 

The algorithm parameters are identical for all experiments 
throughout this paper. The comparison window size is 
9x9, the number of considered principal components is 
9, the number of quantized probabilities is 5. The previous
section showed how the proposed method (ACBM + SS)

Fig. 8. (a) Reference image with a texture and a stripes periodic motif. The 
secondary image is a 2 pixels translation of the reference image. The obtained 
disparity map should be a constant image with value 2. (b) The a contrario 
test gives the right disparity 2 everywhere, except in the stripes region, (c) 
The repeated stripes are locally similar, so the self-similarity threshold rejects 
all the patches in this region. 



deals with noise, occlusions and repeated structures. The
detection method is also adapted to quasi-simultaneous stereo
from aerial or satellite images, where moving objects (cars,
pedestrians) are a serious disturbance. Essentially, this is the
same problem as the occlusion problem, except that in the classic
occlusion problem the occlusion is caused by camera motion in the
presence of a depth difference instead of object motion. Figure 9
shows a stereo pair of images of the city of Marseille (France).
In both cases, several cars have changed position between the two images.
They are duly detected. The shadow regions, which contain
more noise than signal, have also been rejected. We have
also compared our results with Kolmogorov's graph cut
implementation [16], which rejects incoherent matches a posteriori
and labels them as occlusions. In these examples,
graph cuts is able to reject some mismatches due to the
moving objects in the scene, but a lot of conspicuous errors
remain in the final disparity map. Likewise, OpenCV's stereo 
matching algorithm (H fails completely on this kind of 
pairs, even though it obtains correct results in simpler
examples like the one in figure [T) 

The proposed algorithm will now be compared with the
non-dense algorithms of [34], [40], [41] and [24], whose aims are
comparable. All of these papers have published experimental
results on the first Middlebury dataset [35] (Tsukuba, Sawtooth,
Venus and Map pairs of images), on the non-occluded
mask. These four algorithms compute sparse disparity maps
and propose techniques rejecting unreliable pixels. We also 
show some additional comparison with the block matching 
method implemented in the OpenCV library version 2.2.0 (H, 
because it is possibly the most widely used one since it comes 
close to real-time performance. 

The authors of [24] compute an initial classic correlation
disparity map and select correct matches based on the support 
these pixels receive from their neighboring candidate matches 
in 3D after tensor voting. 3D points are grouped into smooth 
surfaces using color and geometric information and the points 
which are inconsistent with the surface color distribution are 
removed. The rejection of wrong pixels is not complete, 
because the algorithm fails when some objects appear only 
in one image, or when occluded surfaces change orientation. 
A variation of the critical rejection parameters can lead to quite 
different results. 




Fig. 9. From top to bottom: reference image, secondary image, ACBM+SS 
disparity map, graph cuts disparity map, and OpenCV disparity map. In our 
disparity map, red points are points which haven't been matched. Notice that 
patches containing a moved car or bus haven't been matched. Poorly textured 
regions (shadows) where noise dominates have also been rejected. Red points 
in the graph cuts disparity map are rejected a posteriori and considered as 
occlusions. The graph cuts disparity map is denser and smoother but several 
mismatches appear in the low textured areas and regions with moving objects. 



[40] detects and matches so-called "dense features" which
consist of a connected set of pixels in the left image and a 
corresponding set of pixels in the right image such that the 
intensity edges on the boundary of these sets are stronger 
than their matching error on the boundary (which is the 
absolute intensity difference between corresponding boundary 
pixels). They call this the "boundary condition." The idea is 
that even the boundary of a non textured region can give a 
correspondence. Then, each dense feature is associated with 
a disparity. The main limitation is the way dense features are 
extracted. They are extracted using a local algorithm which 
processes each scan line independently from the others. As a
result, top and bottom boundaries are lost. On the contrary,
[41] uses graph cuts to extract "dense features" (which of
course does not necessarily imply a dense disparity map) thus
enforcing the boundary conditions. The results in BOll are 
rather dense and the error rate is one of the most competitive 
ones. Yet these good results are also due to the particularly 
well adapted structure of the benchmark. Indeed, the Sawtooth, 
Venus and Map scenes consist of piecewise planar surfaces, 
with almost fronto-parallel surface patches. The ground truth 
of Tsukuba is a piecewise constant disparity map with six 
different disparities. 

Table II summarizes the percentage of matched pixels (density)
and the percentage of mismatches (where the estimated
disparity differs by more than one pixel from the ground
truth). This table reports first the result of ACBM+SS, whose
error rate is very small and which yields larger match densities than
Sara's results [34]. To compare with other algorithms yielding
denser disparity maps, the results of ACBM+SS have been
densified by the most straightforward proximal interpolation
(a 3x3 spatial median filter). Doing this, the match density
rises significantly while the error rates remain small. Still, large
regions containing poor textures, typically shadows in aerial
imaging, are impossible to fill in because they contain no
information at all. Besides the compared algorithms in Table
II, [14] also published non-dense results for the Tsukuba image
(error rate of 2.1% with a density of 45%), but since non-dense
results on other images are not published it does not appear
in our table.
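The two statistics of Table II and the median densification step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: unmatched pixels are encoded as NaN, the error rate is taken as a percentage of the matched pixels, and an unmatched pixel is filled with the median of the valid disparities in its 3x3 neighbourhood. The function names are hypothetical.

```python
import numpy as np


def density_and_error(disp, gt, tol=1.0):
    """Return (density %, error %) of a sparse disparity map.

    A match is wrong when it differs from the ground truth by more
    than `tol` pixels; the error is relative to the matched pixels.
    """
    matched = ~np.isnan(disp)
    n_matched = int(matched.sum())
    density = 100.0 * n_matched / disp.size
    wrong = np.abs(disp[matched] - gt[matched]) > tol
    error = 100.0 * wrong.sum() / max(n_matched, 1)
    return density, error


def densify_median3(disp):
    """Fill each unmatched pixel with the median of its valid 3x3 neighbours."""
    out = disp.copy()
    padded = np.pad(disp, 1, constant_values=np.nan)
    rows, cols = disp.shape
    for r in range(rows):
        for c in range(cols):
            if np.isnan(disp[r, c]):
                window = padded[r:r + 3, c:c + 3]
                valid = window[~np.isnan(window)]
                if valid.size:  # leave the pixel empty if no neighbour is matched
                    out[r, c] = np.median(valid)
    return out


gt = np.full((3, 3), 2.0)
disp = np.array([[2.0, 2.0, 2.0],
                 [2.0, np.nan, 2.0],
                 [2.0, 2.0, 5.0]])
print(density_and_error(disp, gt))
print(density_and_error(densify_median3(disp), gt))
```

Pixels whose whole neighbourhood is unmatched stay empty, which is consistent with the remark that large textureless regions cannot be filled in.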

Fig. 10 compares the ACBM+SS results with OpenCV, graph
cuts and Sara's published results on the classic CMU
Shrub pair¹. Sara's disparity map has several mismatches and
the ACBM+SS results are obviously denser. On the other
hand, Kolmogorov's graph cut implementation is denser but
the mismatches have risen considerably. OpenCV's disparity
map is denser than Sara's and less dense than
Kolmogorov's, but it also has the highest number of wrong matches.
So, the proposed algorithm ACBM+SS has a better trade-off
between density and mismatches. In the Kolmogorov graph
cuts implementation the occlusions are detected, providing a
non-dense disparity map. It is clear that detecting occlusions
in real images is not enough to avoid mismatches. Another
example is shown in Fig. 11, where the almost dense disparity
map obtained with graph cuts is compared with the ACBM+SS
disparity map. Graph cuts assigns a completely wrong disparity
to the top left of the image: the sky and the tree branches are
clearly not at the same depth in the scene. This type of error
is unavoidable with global methods. The depth of the smooth
sky is inherently ambiguous. By the minimization process it
inherits the depth of the twigs through which it is seen.

An interesting question arising from the comparative results
concerns the trade-off between error and density. We have seen
that our algorithm gives very low error percentages with densities
between 40% and 90%. The parameter ε can be increased, but then the
error rate will rise. Our goal is to match points between two images
with high reliability and to reject any possible false
match. So the choice of one expected false alarm (ε = 1) is a
conservative choice, but it ensures a very small error percentage.

¹ http://vasc.ri.cmu.edu/idb/html/jisct

Discussion on the other parameters: We have mentioned
that the number of considered principal components $N$ and the
number of quantized probabilities $Q$ can be increased without
noticeable alteration of the results. In practice, the two values
are set (for computational reasons) to the minimal values
not affecting the quality of the result. They are fixed once and
for all to $N = 9$ and $Q = 5$ respectively. Another parameter
is the search region size $(2R+1)$, but it is easy to set since
we only need $R$ to be larger than the largest disparity in the
image, which is a classic assumption in stereovision algorithms
(in practice $R$ can be estimated from the sparse matching of
interest points that was previously obtained for the epipolar
rectification step). Finally, the last parameter is the size of
the block. We know that very small blocks are affected by
image noise but at the same time, the bigger the block, the
bigger the fattening error (also named adhesion error). This
error becomes apparent at the object borders of the scene,
causing a dilation of their real size which is proportional to the
block size. The fattening phenomenon is not the object of
this paper, but different solutions have already been suggested
to avoid it O. Fixing the size of the block to 9 x 9 seems 
to be a good compromise between noise and fattening for
realistic SNR conditions, ranging from 200 to 20 (the SNR is
measured as the ratio between the average grey level and the
noise standard deviation).

Computational time: For the sake of computational speed,
the PCA basis is previously learnt on a set of representative
images and stored once and for all². Then, this basis is used
to compute all image coefficients. Notice that only the image
coefficients of the second image need to be sorted in order
to compute the resemblance probability between all possible
matches. With our implementation, which is still not highly
optimized for speed, an experiment with a pair of images of
size 512 x 512 with disparity range [-5, 5] takes 4.5 seconds
running on a 2.4 GHz Intel Core 2 Duo processor.
A similar experiment with the OpenCV stereo algorithm takes
between 5 and 500 milliseconds. This is much closer to
real-time requirements, but results are also much more
data-dependent, producing good results in easy examples like the
Middlebury pair, but much less dense and less reliable results
than our method in more difficult scenes like Shrub, Marseille
or even the stereo pairs provided with OpenCV.

V. Conclusion 

Inadequate match thresholds were, in our opinion, the principal
drawback of block-matching algorithms in stereovision. The
a contrario block-matching threshold, which was the principal
object of the present paper, combined with the self-similarity
threshold, is able to detect mismatches systematically, by an
algorithm which is essentially parameter-free. Indeed, the only
user parameter is the expected number of false matches, which
can be fixed once and for all in most applications. The method
indiscriminately detects occlusions, moving objects and poor
or periodic textured regions.

² In our experience the (computationally intensive) choice of this basis does
not significantly affect the results, but the (computationally fast) learning of
marginal distributions for a particular image on this basis does.

                              Tsukuba           Sawtooth          Venus             Map
                              Error   Density   Error   Density   Error   Density   Error   Density
                              (%)     (%)       (%)     (%)       (%)     (%)       (%)     (%)
ACBM + SS                     0.31    45.6      0.09    65.7      0.02    54.1      0.0     84.8
ACBM + SS + Median filter     0.33    54.3      0.14    77.9      0.0     66.6      0.0     93.0
Sara [34]                     1.4     45        1.6     52        0.8     40        0.3     74
Veksler 02 [40]               0.38    66        1.62    76        1.83    68        0.22    87
Veksler 03 [40]               0.36    75        0.54    87        0.16    73        0.01    87
Mordohai and Medioni [24]     1.18    74.5      0.27    78.4      0.20    74.1      0.08    94.2

TABLE II

Quantitative results on the first Middlebury benchmark data set. The error statistics are computed on the mask of
non-occluded pixels. Any error larger than 1 pixel is considered a mismatch. ACBM+SS obtains fewer mismatches in all four images.




Fig. 10. CMU Shrub scene. (a) and (b) Reference and secondary images. (c)
Method of Sara [34]. Red points are rejected. Density: 24%. (d) Kolmogorov's
graph cuts [16]. Red points are points detected as occlusions. Density: 77%.
(e) Proposed method ACBM+SS. Red points are rejected points. Density: 42%.
Sara's disparity map has a lower density and has several evident mismatches.
Kolmogorov's disparity map is denser but has many obvious errors. (f) The
block-matching algorithm included in OpenCV (panel labeled "OpenCV SGBM")
is also not very dense and contains many errors. It is only provided as a
reference of what can be easily obtained with a freely available
quasi-real-time block-matching algorithm.



Mismatches in block-matching have led to the overall dom- 
inance of global energy methods. However, global methods 
have no validation procedure, and the proposed a contrario 
method must be viewed as a validation procedure, no matter 
what the stereo matching process was. Block-matching, to- 
gether with the reliability thresholds established in this paper, 
gives a fairly dense set of reliable matches (from 50% to 80% 
usually). It may be objected that the obtained disparity map 
is not dense. 

This objection is not crucial for two reasons. First, having 





Fig. 11. Flower-garden scene. (a) and (b) Reference and secondary images.
(c) Graph cuts (method of [16]). Red points are occluded points. (d)
ACBM+SS. Red points are rejected points. Density: 59%. Most rejected points
are obviously mismatched by the graph cut algorithm, which equates the
depths of trees, sky and house.



only validated matches opens the path to benchmarks based
on accuracy, and raises the question of which precision can
ultimately be attained (on validated matches only). Second,
knowing which matches are reliable allows one to complete
a given disparity map by fusing several stereo pairs. Since
having multiple observations of the same scene by
several cameras and/or at several different times is by now
a common setting, it becomes more and more important to be
able to fuse 3D information obtained from many stereo pairs.
Having almost only reliable matches in each pair promises an
easy fusion. A straightforward solution in our case would be
the following: given m > 2 images, the disparity map between
each possible pair of images is computed with ACBM+SS.
Then the final disparity map is the accumulated disparity map
considering all meaningful matches computed with all the
image pairs, whenever all the computed disparities for the same
pixel are coherent.
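A minimal sketch of such a fusion, under strong simplifying assumptions that are ours and not the paper's: all sparse disparity maps are already expressed in the same reference frame and in comparable units, unmatched pixels are NaN, "coherent" means that the valid values at a pixel differ by at most a tolerance `tol`, and coherent values are averaged. The function name is hypothetical.

```python
import numpy as np


def fuse_disparities(maps, tol=1.0):
    """Accumulate sparse disparity maps, keeping only coherent pixels."""
    stack = np.stack(maps).astype(np.float64)
    valid = ~np.isnan(stack)
    count = valid.sum(axis=0)
    # Spread of the valid observations at each pixel.
    hi = np.where(valid, stack, -np.inf).max(axis=0)
    lo = np.where(valid, stack, np.inf).min(axis=0)
    coherent = (count > 0) & (hi - lo <= tol)
    total = np.where(valid, stack, 0.0).sum(axis=0)
    fused = np.full(stack.shape[1:], np.nan)
    fused[coherent] = total[coherent] / count[coherent]
    return fused


a = np.array([[1.0, np.nan], [3.0, np.nan]])
b = np.array([[1.5, 2.0], [7.0, np.nan]])
print(fuse_disparities([a, b]))
```

A pixel matched in a single pair is kept as is, a pixel with conflicting disparities is discarded, and a pixel never matched stays empty.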
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